Improved Lite Audio-Visual Speech Enhancement

Abstract

Numerous studies have investigated the effectiveness of audio-visual multimodal learning for speech enhancement (AVSE) tasks, seeking a solution that uses visual data as auxiliary and complementary input to reduce the noise of noisy speech signals. Recently, we proposed a lite audio-visual speech enhancement (LAVSE) algorithm for a car-driving scenario. Compared to conventional AVSE systems, LAVSE requires less online computation and to some extent solves the user privacy problem on facial data. In this study, we extend LAVSE to improve its ability to address three practical issues often encountered in implementing AVSE systems, namely, the additional cost of processing visual data, audio-visual asynchronization, and low-quality visual data. The proposed system is termed improved LAVSE (iLAVSE), which uses a convolutional recurrent neural network architecture as the core AVSE model. We evaluate iLAVSE on the Taiwan Mandarin speech with video dataset. Experimental results confirm that, compared to conventional AVSE systems, iLAVSE can effectively overcome the aforementioned three practical issues and improve enhancement performance. The results also confirm that iLAVSE is suitable for real-world scenarios, where high-quality audio-visual sensors may not always be available.

Similar resources

Audio Visual Speech Enhancement

This thesis presents a novel approach to speech enhancement by exploiting the bimodality of speech production and the correlation that exists between audio and visual speech information. An analysis into the correlation of a range of audio and visual features reveals significant correlation to exist between visual speech features and audio filterbank features. The amount of correlation was also...

Audio-visual enhancement of speech in noise.

A key problem for telecommunication or human-machine communication systems concerns speech enhancement in noise. In this domain, a certain number of techniques exist, all of them based on an acoustic-only approach--that is, the processing of the audio corrupted signal using audio information (from the corrupted signal only or additive audio information). In this paper, an audio-visual approach ...

Inventory-Based Audio-Visual Speech Enhancement

In this paper we propose to combine audio-visual speech recognition with inventory-based speech synthesis for speech enhancement. Unlike traditional filtering-based speech enhancement, inventory-based speech synthesis avoids the usual trade-off between noise reduction and consequential speech distortion. For this purpose, the processed speech signal is composed from a given speech inventory whi...

Audio-visual speech enhancement with AVCDCN (audio-visual codebook dependent cepstral normalization)

In this paper, we introduce a non-linear enhancement technique called Audio-Visual Codebook Dependent Cepstral Normalization (AVCDCN) and we consider its use with both audio-only and audio-visual speech recognition. AVCDCN is inspired from CDCN [1] [2], an audio-only enhancement technique that approximates the non-linear effect of noise on speech with a piece-wise constant function. Our experim...

Introducing the Turbo-Twin-HMM for Audio-Visual Speech Enhancement

Models for automatic speech recognition (ASR) hold detailed information about spectral and spectro-temporal characteristics of clean speech signals. Using these models for speech enhancement is desirable and has been the target of past research efforts. In such model-based speech enhancement systems, a powerful ASR is imperative. To increase the recognition rates especially in low-SNR condition...

Journal

Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing

Year: 2022

ISSN: 2329-9304, 2329-9290

DOI: https://doi.org/10.1109/taslp.2022.3153265